Unbinned likelihood fits are frequent in Physics, and often involve complex functions with several components.
We discuss the potential pitfalls of situations where the templates used in the fit are not fixed but depend on the
event observables, as happens when the resolution of the measurement is event-dependent, and the procedure
to avoid them.



When several categories of events are present in the
same data sample, an unbinned Maximum Likelihood
fit is often used to determine the proportion and the
properties of each class of events. This procedure
makes use of "templates", representing the probability
distribution of the observables used in the fit for each
class of events. In the simplest cases the templates
are completely determined by the values assigned to
the parameters of the fit, but frequently a more
sophisticated approach is chosen where templates vary
on an event-by-event basis, according to the resolution
of the measurement for that particular event. These
variations are due to the dependence of the resolution on
extra variables that change on an event-by-event
basis. This may happen, for instance, when events are
recorded by a detector that has different resolutions
in different regions within its acceptance.

A common example of this kind of fit in HEP is 
given by lifetime and/or mass fits (see Q for a sample 
list of recent experimental papers), where variations 
in resolution occur as a consequence of the different
configuration of each individual decay. The same kind of
issue, however, is likely to arise in other situations.

The purpose of this short paper is to point out some 
potential pitfalls in this kind of fitting procedure. I 
will illustrate the point with reference to a simple toy 
problem. 



1. A toy problem 

Consider an experiment in which two types of
events, A and B, can occur. Let $f$ be the fraction
of type-A events, that is, the probability of a generic
event to be of type A. We want to extract a measurement
of $f$ from a given sample of data. In order to do
this, we measure the value of an observable $x$, having
the following probability distributions:

$$p(x|A) = N(0, \sigma), \qquad p(x|B) = N(1, \sigma)$$

where $\sigma$ is a known constant and $N(\mu, \sigma)$ is the
normal distribution with mean $\mu$ and standard deviation $\sigma$.

This problem is easily solved using an "unbinned
Likelihood fit". This consists of maximizing the
Likelihood function:

$$L(f) = \prod_i \left[ f\,N(x_i, 0, \sigma) + (1-f)\,N(x_i, 1, \sigma) \right] \qquad (1)$$

with respect to the required parameter $f$ (here
$N(x, \mu, \sigma)$ indicates the gaussian function in the
variable $x$). This is very simple to perform with the help
of a numerical maximization program.
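
As an illustration of the procedure just described, the toy fit can be sketched in a few lines of Python. This is only a minimal sketch under our own assumptions, not the code used for the results quoted in this paper: the helper names (`gauss`, `fit_fraction`, `run_toys`) are hypothetical, each event is assigned to type A at random with probability $f$, the estimate is restricted to the interval (0, 1), and a golden-section search stands in for a general numerical maximization program (this is sufficient here because $-\ln L$ is convex in $f$).

```python
import math
import random

def gauss(x, mu, sigma):
    """Gaussian pdf N(x, mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_fraction(pa, pb, tol=1e-5):
    """ML estimate of f for L(f) = prod_i [f*pa[i] + (1-f)*pb[i]].

    -log L is convex in f, so a golden-section search on (0, 1) is enough.
    """
    def nll(f):
        return -sum(math.log(f * a + (1.0 - f) * b) for a, b in zip(pa, pb))

    lo, hi = 1e-6, 1.0 - 1e-6
    g = (math.sqrt(5.0) - 1.0) / 2.0           # inverse golden ratio
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = nll(c), nll(d)
    while hi - lo > tol:
        if fc < fd:                             # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = nll(c)
        else:                                   # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = nll(d)
    return 0.5 * (lo + hi)

def run_toys(n_toys=200, n_events=150, f_true=1.0 / 3.0, sigma=1.0, seed=1):
    """Mean and SD of the ML estimate of f over n_toys pseudo-experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_toys):
        pa, pb = [], []
        for _ in range(n_events):
            mu = 0.0 if rng.random() < f_true else 1.0   # type A or type B
            x = rng.gauss(mu, sigma)
            pa.append(gauss(x, 0.0, sigma))   # template value under hypothesis A
            pb.append(gauss(x, 1.0, sigma))   # template value under hypothesis B
        estimates.append(fit_fraction(pa, pb))
    mean = sum(estimates) / n_toys
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_toys - 1))
    return mean, sd

mean_f, sd_f = run_toys()
print(f"mean = {mean_f:.3f}, SD = {sd_f:.3f}")
```

The template values are computed once per event, outside the minimization loop, since they do not depend on $f$. With only a few hundred pseudo-experiments the printed numbers fluctuate at the percent level.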

Let's make a specific numeric example, where $f = 1/3$
and $\sigma = 1$ (see illustration in Fig. 1), and the
size of the data sample is 150 events. By repeatedly
generating MC samples of 150 events each, we obtain
the distribution of the Maximum Likelihood estimator
of $f$, which is shown in Fig. 2.




Figure 1: Probability distribution of $x$ for the toy
problem described in the text. Contributions of type-A
and type-B events are also shown.



Its mean is 0.3368 ± 0.0041 and SD = 0.083, in
agreement with expectations of 0.3333 and 0.088
respectively (the latter coming from the Fisher information).
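
The expected spread can be cross-checked numerically. For a two-component mixture the Fisher information per event on $f$ is $I(f) = \int [p(x|A) - p(x|B)]^2 / [f\,p(x|A) + (1-f)\,p(x|B)]\,dx$, and the minimum variance bound on the SD of the estimator with $n$ events is $1/\sqrt{n I(f)}$. The sketch below evaluates this with a simple trapezoidal rule; the function name `fisher_sd` and the integration range are our own choices, and the result should come out close to the 0.088 quoted above.

```python
import math

def gauss(x, mu, sigma):
    """Gaussian pdf N(x, mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fisher_sd(f, sigma, n_events, lo=-10.0, hi=11.0, steps=20000):
    """Minimum-variance-bound SD on f for the constant-resolution toy problem.

    Integrates (pA - pB)^2 / (f*pA + (1-f)*pB) over x with the trapezoidal rule.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        pa = gauss(x, 0.0, sigma)
        pb = gauss(x, 1.0, sigma)
        weight = 0.5 if k in (0, steps) else 1.0   # half weight at the end points
        total += weight * (pa - pb) ** 2 / (f * pa + (1.0 - f) * pb)
    info_per_event = total * h
    return 1.0 / math.sqrt(n_events * info_per_event)

print(f"expected SD = {fisher_sd(1.0 / 3.0, 1.0, 150):.4f}")
```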




Figure 2: Distribution of ML estimate of the fraction $f$
of type-A events (see text)

2. A toy problem, with variable resolution

Let's now suppose that the resolution of $x$ is not
constant, but rather depends on the event: we are
assuming that each event $x_i$ comes together with an
individual value of $\sigma$ (let it be $\sigma_i$). This situation
is encountered in many real-life problems, and the
common approach found in the literature is to simply
modify the Likelihood function as follows:

$$L(f) = \prod_i \left[ f\,N(x_i, 0, \sigma_i) + (1-f)\,N(x_i, 1, \sigma_i) \right] \qquad (2)$$

This looks like a pretty obvious generalization of
expression (1). To test it in our toy problem, we
modified our toy MC from the previous example, by making
$\sigma$ fluctuate at each event within an arbitrarily chosen
range (1.0 to 3.0), and again made repeated simulated
experiments of 150 events each, maximizing the
Likelihood expression (2) to estimate $f$. The result of this
test is shown in Fig. 3 and, rather surprisingly, shows
a very large bias with respect to the true value of $f$.

Figure 3: Distribution of ML estimate of the fraction $f$
of type-A events, obtained from a "conditional
Likelihood"

This may seem really odd, until one realizes that
this new problem is very different from the previous
one. Our problem now has actually two observables:
each observation consists of the pair of values $(x_i, \sigma_i)$
rather than just $x_i$, and its probability density
depends on both. This means that the Likelihood must
now be written based on the probability distributions
of the $(x_i, \sigma_i)$ pair:

$$L(f) = \prod_i \left[ f\,p(x_i, \sigma_i|A) + (1-f)\,p(x_i, \sigma_i|B) \right] \qquad (3)$$

Remembering that $p(x_i, \sigma_i|X) = p(x_i|\sigma_i, X)\,p(\sigma_i|X)$,
we can write the correct expression of the Likelihood
for our problem as:

$$L(f) = \prod_i \left[ f\,N(x_i, 0, \sigma_i)\,p(\sigma_i|A) + (1-f)\,N(x_i, 1, \sigma_i)\,p(\sigma_i|B) \right] \qquad (4)$$

where $p(\sigma_i|X)$ is the pdf of $\sigma_i$ for events of type $X$, an
element that was absent in eq. (2). In fact, comparing
the two expressions shows that (2) is actually the
conditional probability distribution $p(x_i|\sigma_i, f)$ (one might
call it "conditional Likelihood") rather than the full
distribution $p(x_i, \sigma_i|f)$. The difference matters for
fitting unless it happens that the distribution of $\sigma_i$ is the
same for all types of events: $p(\sigma_i|A) = p(\sigma_i|B)$. In
that case, $p(\sigma_i)$ can be factored out, and the
incomplete Likelihood of eq. (2) differs from the true
Likelihood just by a factor independent of $f$, that does not
affect the maximization.
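
The factorization argument can be written out explicitly. Assuming a common distribution $p(\sigma_i|A) = p(\sigma_i|B) \equiv p(\sigma_i)$, the full Likelihood becomes:

```latex
\begin{align*}
L(f) &= \prod_i \left[ f\,N(x_i,0,\sigma_i)\,p(\sigma_i)
        + (1-f)\,N(x_i,1,\sigma_i)\,p(\sigma_i) \right] \\
     &= \Big[ \prod_i p(\sigma_i) \Big]\,
        \prod_i \left[ f\,N(x_i,0,\sigma_i) + (1-f)\,N(x_i,1,\sigma_i) \right], \\[4pt]
\frac{\partial \ln L}{\partial f}
     &= \sum_i \frac{N(x_i,0,\sigma_i) - N(x_i,1,\sigma_i)}
                    {f\,N(x_i,0,\sigma_i) + (1-f)\,N(x_i,1,\sigma_i)} .
\end{align*}
```

The prefactor $\prod_i p(\sigma_i)$ drops out of the derivative, so the full and the conditional expressions have their maximum at the same $f$. When the two distributions differ, $p(\sigma_i|A)$ and $p(\sigma_i|B)$ stay inside the square bracket as different weights for the two terms, and no such cancellation occurs.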

In the specific MC test reported above, we
simulated a resolution 1.5 times worse for events of type
B than for type A, setting the $\sigma_i$ distribution as flat
between 1 and 2 for type-A events, and flat between
1.5 and 3 for type-B events. We intentionally avoided
saying this explicitly before, in order to put the reader
in the typical situation encountered in reality, where
no attention is paid to the distribution of those
resolutions for the different classes of events considered in
the fit. It turns out from our example that this may
lead to very biased results.

In summary, expression (2) simply does not work for
fitting, and by a large amount: it can be said to belong 
to that particular class of solutions nicely defined in 
as 'SNW solutions'. 

Conversely, if we use in fitting the correct
expression of the Likelihood (eq. (4)), we get the result shown
in Fig. 4, showing a negligible bias. The resolution
of the fit is also much better, as the difference in the
distributions of the $\sigma$ themselves gets exploited in
separating the two samples; this however is a minor point
in comparison with the bias issue.
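
The comparison between the conditional and the full Likelihood can be reproduced with a short simulation. What follows is a minimal sketch under our own assumptions, not the original MC: the function names are hypothetical, the event type is drawn at random with probability $f = 1/3$, the $\sigma_i$ distributions are the flat ones described above, and only a small number of pseudo-experiments is run, so the averages are not expected to match the quoted values exactly.

```python
import math
import random

def gauss(x, mu, sigma):
    """Gaussian pdf N(x, mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def flat(s, lo, hi):
    """Uniform pdf on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= s <= hi else 0.0

def fit_fraction(pa, pb, tol=1e-5):
    """Maximize sum_i log(f*pa[i] + (1-f)*pb[i]) over f (golden-section search)."""
    def nll(f):
        return -sum(math.log(f * a + (1.0 - f) * b) for a, b in zip(pa, pb))
    lo, hi = 1e-6, 1.0 - 1e-6
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = nll(c), nll(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = nll(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = nll(d)
    return 0.5 * (lo + hi)

def run_toys(n_toys=60, n_events=150, f_true=1.0 / 3.0, seed=1):
    """Average ML estimate of f with the conditional and the full Likelihood."""
    rng = random.Random(seed)
    sum_cond = sum_full = 0.0
    for _ in range(n_toys):
        cond_a, cond_b, full_a, full_b = [], [], [], []
        for _ in range(n_events):
            if rng.random() < f_true:            # type A: sigma flat in [1, 2]
                s = rng.uniform(1.0, 2.0)
                x = rng.gauss(0.0, s)
            else:                                 # type B: sigma flat in [1.5, 3]
                s = rng.uniform(1.5, 3.0)
                x = rng.gauss(1.0, s)
            na, nb = gauss(x, 0.0, s), gauss(x, 1.0, s)
            cond_a.append(na)                     # conditional expression: N only
            cond_b.append(nb)
            full_a.append(na * flat(s, 1.0, 2.0)) # full expression: N * p(sigma|X)
            full_b.append(nb * flat(s, 1.5, 3.0))
        sum_cond += fit_fraction(cond_a, cond_b)
        sum_full += fit_fraction(full_a, full_b)
    return sum_cond / n_toys, sum_full / n_toys

mean_cond, mean_full = run_toys()
print(f"conditional: {mean_cond:.3f}   full: {mean_full:.3f}   true: 0.333")
```

The only difference between the two fits is the factor `flat(...)` multiplying each template; the generated events are identical.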

Figure 4: Distribution of ML estimate of the fraction $f$
of type-A events, using the full Likelihood function



3. Additional tests 

One may wonder which features of the distributions
make for a large bias. Table I shows results for
a few variants of the original problem. Tests include:

• Equal-width ranges of $\sigma$.

• Disjoint $\sigma$ ranges.

• Constant, but different $\sigma$ for A and B.

• Constant, and close $\sigma$'s for A and B.

• Same-mean $\sigma$ distribution with different widths.

• Only one type of events has variable sigma.

Table I: Results of MC fitting tests.

Resolutions                "conditional" L (2)              True Likelihood
$\sigma_A$    $\sigma_B$   $f_A$            $\sigma(f_A)$   $f_A$            $\sigma(f_A)$

1.0           1.0          0.336 ± 0.003    0.08
[1.0,2.0]     [1.5,3.0]    0.514 ± 0.007    0.14            0.335 ± 0.002    0.03
[1.0,2.0]     [1.5,2.5]    0.474 ± 0.007    0.14            (illegible)
[1.0,2.0]     [2.0,3.0]    0.579 ± 0.008    0.15            0.333 ± 0.000    0.00
1.0           2.0          (illegible)                      0.333 ± 0.000    0.00
1.0           1.1          0.374 ± 0.004    0.08            0.333 ± 0.000    0.00
[0.5,3.5]     [1.5,2.5]    0.330 ± 0.006    0.12            0.332 ± 0.002    0.03
1.0           [1.0,2.0]    0.482 ± 0.009    0.09            0.333 ± 0.000    0.00

Resolutions                modified L (5)                   True Likelihood
($\sigma_A$ actually = 1)

1.0           [1.0,2.0]    0.374 ± 0.004    0.08            0.333 ± 0.000    0.00
[0.5,3.5]     [1.0,2.0]    0.414 ± 0.004    0.08            0.332 ± 0.003    0.03

In almost every situation we tried, we found
expression (2) to return largely biased results. The exception
occurs when the average $\sigma$ is the same; the resolution
on $f$ is however much worse than with the correct
expression. It looks like the most important element is
the difference between the average values of $\sigma$ for the
different samples; the actual variability within each
sample seems less important.

A simpler situation exists, that is pretty common in
practice, where one has just one signal component over
a background, and the signal distribution contains a
variable sigma, while the background is represented
just by a fixed template. In this case, expression (2)
becomes:

$$L(f) = \prod_i \left[ f\,N(x_i, 0, 1) + (1-f)\,N(x_i, 1, \sigma_i) \right] \qquad (5)$$

This expression of $L$ is of course still incorrect, but
it better describes reality at least for one of the two
event categories by incorporating explicitly the
information that it has a fixed sigma. Here a variable
template appears in just one component; since this is
the simplest configuration with a variable template,
it is interesting to ask whether it yields a reasonable
approximation of the correct results.

If we apply this new Likelihood expression to the
last tested case ($\sigma_A = 1.0$ and $\sigma_B \in [1.0, 2.0]$), we
find that the result is still biased, although to a lesser
extent (Tab. I). This shows that the distribution
of $\sigma$ must be taken into account even in the simplest
situation, where it appears in only one component of
the fit.

The mechanism underlying this problem is easier to
see by looking at a variant of the previous case.
Suppose that resolutions are the same as above, but for
events of type A the variable $\sigma_i$ is distributed over a
wide range (0.5-3.5); this is not the actual value of
the resolution for those events, which is still fixed at 1,
so for type-A events it represents just an additional
meaningless number. This is a definite possibility in
a real case, where the nature of type-A events may
be so different from type B as to produce meaningless
values for the resolution estimator $\sigma_i$, which was
designed to work for type-B events (remember that
the distribution of A is given as fixed). Note that the
expression used (5) does know that much, and
correctly disregards the value of $\sigma_i$ in the A hypothesis.
For events of type B, the variable $\sigma_i$ correctly
represents the sigma, event by event, of the observable $x$,
and the $L$ function correctly accounts for this, too.

It may come as a surprise that the result is largely
biased (see the last row of Table I). The reason for this
rather spectacular failure is that the second piece of $L$,
related to type-B events, gets confused by the presence of
the events of type A with meaningless values of sigma: they
unavoidably enter both terms of $L$ during the calculation. The
conclusion is: whenever you include $\sigma_i$ in your
Likelihood expression, even for just one class of events, you
must also account for its distribution, and you must
do so for all event classes.
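
This last variant can also be tried out numerically. The sketch below is again only an illustration under our own assumptions (hypothetical function names, event type drawn at random with probability $f = 1/3$, a modest number of pseudo-experiments): type-A events have $x$ distributed as $N(0, 1)$ and carry a meaningless $\sigma_i$ flat in [0.5, 3.5], while type-B events have $\sigma_i$ flat in [1, 2] and $x$ distributed as $N(1, \sigma_i)$. The same events are fitted with the modified expression (5) and with the full Likelihood that includes the $\sigma_i$ distributions of both classes.

```python
import math
import random

def gauss(x, mu, sigma):
    """Gaussian pdf N(x, mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def flat(s, lo, hi):
    """Uniform pdf on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= s <= hi else 0.0

def fit_fraction(pa, pb, tol=1e-5):
    """Maximize sum_i log(f*pa[i] + (1-f)*pb[i]) over f (golden-section search)."""
    def nll(f):
        return -sum(math.log(f * a + (1.0 - f) * b) for a, b in zip(pa, pb))
    lo, hi = 1e-6, 1.0 - 1e-6
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = nll(c), nll(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = nll(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = nll(d)
    return 0.5 * (lo + hi)

def run_toys(n_toys=150, n_events=150, f_true=1.0 / 3.0, seed=1):
    """Average ML estimate of f with the modified expression and the full one."""
    rng = random.Random(seed)
    sum_mod = sum_full = 0.0
    for _ in range(n_toys):
        mod_a, mod_b, full_a, full_b = [], [], [], []
        for _ in range(n_events):
            if rng.random() < f_true:            # type A: true resolution is 1
                s = rng.uniform(0.5, 3.5)        # recorded sigma_i is meaningless
                x = rng.gauss(0.0, 1.0)
            else:                                 # type B: sigma_i is the resolution
                s = rng.uniform(1.0, 2.0)
                x = rng.gauss(1.0, s)
            na = gauss(x, 0.0, 1.0)               # fixed template for A
            nb = gauss(x, 1.0, s)                 # event-dependent template for B
            mod_a.append(na)
            mod_b.append(nb)
            full_a.append(na * flat(s, 0.5, 3.5))
            full_b.append(nb * flat(s, 1.0, 2.0))
        sum_mod += fit_fraction(mod_a, mod_b)
        sum_full += fit_fraction(full_a, full_b)
    return sum_mod / n_toys, sum_full / n_toys

mean_mod, mean_full = run_toys()
print(f"modified: {mean_mod:.3f}   full: {mean_full:.3f}   true: 0.333")
```

Note that the A template itself is correct for every event; the only ingredient missing from the modified expression is the pair of $\sigma_i$ distributions.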



4. Conclusions 

Whenever the templates used in a multi-component
fit depend on additional observables, one should
always use the correct, complete Likelihood expression
(4), including the explicit distributions of all
observables for all classes of events. This is necessary even
if just one of the components is based on a variable
$\sigma$. The simpler expressions that are commonly used
should be considered unreliable unless one can show
that the distribution of the variable $\sigma$ is the same for
all components.

A more general consideration suggested by the
examples discussed above is that one should always be
wary of "intuitive" modifications of a Likelihood
function. For every given problem there is only one correct
expression for the Likelihood (up to a multiplicative
constant factor), and it is crucial to verify in every
case that the expression used is the right one, rather
than rely on intuition.